Abstract

Network or graph structures are ubiquitous in the study of complex
systems. Often, we are interested in the complexity trends of these systems
as they evolve under some dynamic. An example might be looking at the
complexity of a food web as species enter an ecosystem via migration or
speciation, and leave via extinction.

In this paper, a complexity measure of networks is proposed based on
the "complexity is information content" paradigm. To apply this paradigm
to any object, one must fix two things: a representation language, in which
strings of symbols from some alphabet describe, or stand for, the objects
being considered; and a means of determining when two such descriptions
refer to the same object. With these two things set, the information
content of an object can be computed in principle from the number of
equivalent descriptions describing a particular object.

I propose a simple representation language for undirected graphs that
can be encoded as a bitstring, with equivalence taken to be topological
equivalence. I also present an algorithm for computing the complexity of an
arbitrary undirected network.

1 Introduction 

In [12], I argue that information content provides an overarching complexity
measure that connects the many and various complexity measures proposed
(see |S] for a review). The idea is fairly simple. In most cases, there is an
obvious prefix-free representation language within which descriptions of the objects
of interest can be encoded. There is also a classifier of descriptions that can
determine if two descriptions correspond to the same object. This classifier is
commonly called the observer, denoted $O(x)$.

To compute the complexity of some object $x$, count the number of equivalent
descriptions $\omega(\ell, x)$ of length $\ell$ that map to the object $x$ under the agreed
classifier. Then the complexity of $x$ is given in the limit as $\ell \to \infty$:

$$C(x) = \lim_{\ell\to\infty} \left[ \ell \log N - \log \omega(\ell, x) \right] \qquad (1)$$

where $N$ is the size of the alphabet used for the representation language.

Because the representation language is prefix-free, every description $y$ in
that language has a unique prefix of length $s(y)$. The classifier does not care
what symbols appear after this unique prefix. Hence $\omega(\ell, O(y)) \ge N^{\ell - s(y)}$,
$0 \le C(O(y)) \le s(y)$ and so equation (1) converges.
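
The bounds can be spelled out in one line. This is a sketch that assumes logarithms are taken to base $N$, so that one symbol counts as one unit of information; the text does not state the base explicitly.

$$
N^{\ell - s(y)} \;\le\; \omega(\ell, O(y)) \;\le\; N^{\ell}
\quad\Longrightarrow\quad
0 \;\le\; \ell - \log_N \omega(\ell, O(y)) \;\le\; s(y).
$$

The upper bound on $\omega$ holds because there are only $N^{\ell}$ strings of length $\ell$ in total. Appending any one of the $N$ symbols to a length-$\ell$ description of $x$ gives a length-$(\ell+1)$ description of $x$, so $\omega(\ell+1, x) \ge N\,\omega(\ell, x)$ and the bracketed quantity in equation (1) is non-increasing in $\ell$. A non-increasing sequence bounded below by 0 converges.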

The relationship of this algorithmic complexity measure to more familiar
measures, such as Kolmogorov (KCS) complexity, is given by the coding
theorem [3, Thm 4.3.3]. Equation (1) corresponds to the logarithm of the
universal a priori probability. The difference between these measures is bounded
by a constant independent of the complexity of $x$.

Many measures of network properties have been proposed, starting with 
node count and connectivity (no. of links), and passing in no particular order 
through cyclomatic number (no. of independent loops), spanning height (or 
width), no. of spanning trees, distribution of links per node and so on. Graphs 
tend to be classified using these measures — small world graphs tend to have 
small spanning height relative to the number of nodes and scale free networks 
exhibit a power law distribution of node link count. 

Some of these measures are related to graph complexity, for example node 
count and connectivity can be argued to be lower and upper bounds of the 
network complexity respectively. However, none of the proposed measures gives 
a theoretically satisfactory complexity measure, which in any case is context 
dependent (ie dependent on the observer O, and the representation language). 

In this paper we shall consider only undirected graphs, however the extension
of this work to directed graphs should not pose too great a problem. In setting
the classifier function, we assume that only the graph's topology counts:
positions and labels of nodes and links are not considered important. Clearly,
this is not appropriate for all applications, for instance in food web theory, the
interaction strengths (and signs) labeling each link are crucially important.

The issue of representation language, however, is far more problematic. In
some cases, eg with genetic regulatory networks, there may be a clear
representation language, but for many cases there is no uniquely identifiable language.
However, the invariance theorem [3, Thm 2.1.1] states that the difference in
complexity determined by two different Turing complete representation languages
(each of which is determined by a universal Turing machine) is at most a
constant, independent of the objects being measured. Thus, in some sense it does
not matter what representation one picks: one is free to pick a representation
that is convenient, however one must take care with non Turing complete
representations.

In the next section, I will present a concrete graph description language that
can be represented as binary strings, and is amenable to analysis. The quantity
$\omega$ in eq. (1) can be simply computed from the size of the automorphism group,
for which computationally feasible algorithms exist [10].

The notion of complexity presented in this paper naturally marries with
thermodynamic entropy $S$ |S]:

$$S_{\max} = C + S \qquad (2)$$

where $S_{\max}$ is called potential entropy, ie the largest possible value that entropy
can assume under the specified conditions. The interest here is that a dynamical
process updating network links can be viewed as a dissipative system, with
links being made and broken corresponding to a thermodynamic flux. It would
be interesting to see if such processes behave according to the maximum entropy
production principle |S] or the minimum entropy production principle [11].

Network                                    Bitstring description
3 nodes, one link                          1110100, 1110010, 1110001
3 nodes, two links                         1110110, 1110101, 1110011
4 nodes, three links meeting at one node   11110110100, 11110101010, 11110011001, 11110000111

Table 1: A few example networks, with their bitstring descriptions

In artificial life, the issue of complexity trend in evolution is extremely
important [2]. I have explored the complexity of individual Tierran organisms
[13, 14], which, if anything, shows a trend to simpler organisms. However, it is
entirely plausible that complexity growth takes place in the network of ecological
interactions between individuals. For example, in the evolution of the
eukaryotic cell, mitochondria are simpler entities than the free-living bacteria from
which they are supposedly descended. A computationally feasible measure of network
complexity is an important prerequisite for further studies of evolutionary complexity
trends.

2 Representation Language 

One very simple implementation language for undirected graphs is to label the
nodes $1..N$, and the links by the pair $(i,j)$, $i < j$, of nodes that the links connect.
The linklist can be represented simply by an $N(N-1)/2$ length bitstring, where
the $\frac{1}{2}j(j-1)+i$th position is 1 if link $(i,j)$ is present, and 0 otherwise. We also
need to prepend the string with the value of $N$ in order to make it prefix-free.
The simplest approach is to interpret the number of leading 1s as the number
$N$, which adds a term $N+1$ to the measured complexity.

Some example 3 and 4 node networks are shown in Table 1. One can see
how several descriptions correspond to the same topological network, but with
different node numberings.
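
A minimal Python sketch of this encoding follows. It assumes nodes are numbered from 0 and that link $(i,j)$, $i<j$, sits at 0-based position $j(j-1)/2+i$ of the linklist; this reading reproduces the 3-node, one-link bitstrings listed in Table 1. The function names `encode` and `decode` are illustrative and not taken from any library.

```python
def encode(n, links):
    """Return the bitstring description of an n-node undirected graph.

    Assumption: nodes are numbered 0..n-1 and link (i, j) with i < j is
    stored at 0-based position j*(j-1)/2 + i of the linklist.
    """
    linklist = ["0"] * (n * (n - 1) // 2)
    for a, b in links:
        i, j = min(a, b), max(a, b)
        if i < 0 or j >= n or i == j:
            raise ValueError(f"invalid link ({a}, {b}) for {n} nodes")
        linklist[j * (j - 1) // 2 + i] = "1"
    # n leading 1s and a terminating 0 make the description prefix-free.
    return "1" * n + "0" + "".join(linklist)


def decode(bits):
    """Recover (n, links) from a bitstring produced by encode."""
    n = bits.index("0")  # number of leading 1s
    linklist = bits[n + 1 : n + 1 + n * (n - 1) // 2]
    links = [
        (i, j)
        for j in range(n)
        for i in range(j)
        if linklist[j * (j - 1) // 2 + i] == "1"
    ]
    return n, links
```

For instance, `encode(3, [(0, 1)])`, `encode(3, [(0, 2)])` and `encode(3, [(1, 2)])` give three different bitstrings for the same one-link network, differing only in node numbering.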

A few other properties are also apparent. A network $A$ that has a link
wherever $B$ doesn't, and vice-versa, might be called a complement of $B$. A
bitstring for $A$ can be found by inverting the 1s and 0s in the linklist part of
the network description. Obviously, $\omega(\ell, A) = \omega(\ell, B)$.

Network            ω    C
empty              1    7
one link           3    5.42
two links          3    5.42
fully connected    1    7

Table 2: Enumeration of all 3-node networks, with number of equivalent
bitstrings (ω) and complexity (C)

Network                            Complement                         ω     C
empty                              fully connected                    1     11
one link                           fully connected less one link      6     8.42
two links sharing a node           triangle with one pendant link     12    7.42
two disjoint links                 4-node cycle                       3     9.42
path of three links                same                               12    7.42
three links meeting at one node    triangle plus one isolated node    4     9

Table 3: Enumeration of all 4-node networks, with number of equivalent
bitstrings (ω) and complexity (C)

The empty network and the fully connected network have linklists that are
all 0s or all 1s. These networks are maximally complex at

$$C = \frac{1}{2}N(N+1) + 1 \qquad (3)$$

bits. This perhaps surprising feature is partly a consequence of the definition
we're using for network equivalence. If instead we ignored unconnected nodes
(say we had an infinite number of nodes, but only a finite number of them
connected into a network), then the empty network would have extremely low
complexity, as one would need to sum up the $\omega$s for $N = 0, 1, \ldots$. But in
this case, there would no longer be any symmetry between a network and its
complement.
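
Equation (3) can be checked directly from the representation. The following short derivation assumes base-2 logarithms, since descriptions are bitstrings, and writes $\ell$ for the length of a complete description:

$$
\ell = \underbrace{N + 1}_{\text{prefix}} + \underbrace{\tfrac{1}{2}N(N-1)}_{\text{linklist}} = \tfrac{1}{2}N(N+1) + 1,
\qquad
C = \ell - \log_2 \omega .
$$

For the empty and full networks no renumbering of the nodes changes the linklist, so $\omega = 1$ and $C = \ell$. With $N = 4$ this gives $C = 11$ bits, while a 4-node network with $\omega = 12$ equivalent descriptions has $C = 11 - \log_2 12 \approx 7.42$ bits.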

It is also a consequence of not using a Turing complete representation lan- 
guage. Empty and full networks are highly compressible, therefore we'd expect a 
Turing complete representation language would be able to represent the network 
in a compressed form, lowering the measured complexity. 

Networks of 3 nodes and 4 nodes are sufficiently simple that it is possible
to enumerate all possibilities by hand. It is possible to numerically enumerate
larger networks using a computer, however one will rapidly run into diminishing
returns, as the number of bitstrings to consider grows as $2^{\frac{1}{2}N(N-1)}$. I have done
this up to 8 nodes, as shown in Fig. 1.
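
The exhaustive enumeration can be sketched in a few lines of Python. This is a brute-force illustration, not the code used for the results reported here: it takes the canonical form of a linklist to be the lexicographically smallest linklist over all node renumberings, and it assumes the 0-based link ordering $j(j-1)/2+i$. All function names are illustrative.

```python
from itertools import permutations
from math import log2


def link_pairs(n):
    """Node pairs (i, j), i < j, in assumed linklist order j*(j-1)/2 + i."""
    return [(i, j) for j in range(n) for i in range(j)]


def relabel(linklist, perm, pairs, index):
    """Linklist obtained by renumbering node k as perm[k]."""
    out = ["0"] * len(linklist)
    for bit, (i, j) in zip(linklist, pairs):
        if bit == "1":
            a, b = sorted((perm[i], perm[j]))
            out[index[(a, b)]] = "1"
    return "".join(out)


def enumerate_networks(n):
    """Map each canonical linklist to (omega, C) over all n-node networks."""
    pairs = link_pairs(n)
    index = {p: k for k, p in enumerate(pairs)}
    m = len(pairs)
    perms = list(permutations(range(n)))
    counts = {}
    for value in range(2 ** m):
        linklist = format(value, f"0{m}b") if m else ""
        # Brute-force canonical form: smallest linklist over all renumberings.
        canon = min(relabel(linklist, p, pairs, index) for p in perms)
        counts[canon] = counts.get(canon, 0) + 1
    length = n + 1 + m  # prefix of n ones, a zero, then the linklist
    return {c: (w, length - log2(w)) for c, w in counts.items()}
```

Calling `enumerate_networks(4)` visits all 64 linklists and groups them into 11 topologically distinct networks. The cost grows as $2^{N(N-1)/2} \cdot N!$, so this sketch is only practical for very small $N$.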

3 Computing ω

The first problem to be solved is how to determine if two network descriptions
in fact correspond to the same network. We borrow a trick from the field of
symbolic computing, which is to say we arrange a canonical labeling of the
nodes, and then compare the canonical forms of each description. Brendan
McKay [10] has solved the problem of finding canonical labelings of arbitrary
graphs, and supplies a convenient software library called nauty¹ that implements
the algorithm.

The number of possible distinct descriptions is given by $N!$ (the number of
possible renumberings of the nodes), divided by the number of such renumberings
that reproduce the canonical form. As a stroke of good fortune, nauty reports
this value as the order of the automorphism group, and is quite capable of
computing this value for networks with tens of thousands of nodes within seconds
on modern CPUs. So the complexity value $C$ in equation (1) is computationally
feasible, with this particular choice of representation.
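
To make the relationship concrete, here is a small Python sketch that counts automorphisms by brute force over all $N!$ renumberings and then applies $\omega = N!/|\mathrm{Aut}|$. It is a stand-in for nauty, which computes the automorphism group order far more efficiently; it does not use nauty's API, and the function names are illustrative.

```python
from itertools import permutations
from math import factorial, log2


def automorphism_count(n, links):
    """Number of node renumberings that map the link set onto itself."""
    edges = {frozenset(link) for link in links}
    count = 0
    for perm in permutations(range(n)):
        image = {frozenset((perm[a], perm[b])) for a, b in links}
        if image == edges:
            count += 1
    return count


def complexity(n, links):
    """Return (omega, C) for an n-node undirected network."""
    omega = factorial(n) // automorphism_count(n, links)
    length = n + 1 + n * (n - 1) // 2  # bits in one complete description
    return omega, length - log2(omega)
```

For example, the 4-node path `[(0, 1), (1, 2), (2, 3)]` has two automorphisms (the identity and the end-to-end reversal), so `complexity` reports 24/2 = 12 equivalent descriptions.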

4 Compressed complexity and Offdiagonal complexity

I have already mentioned the issue of non Turing completeness of the proposed
bitstring representation of a network. This has its most profound effect for
regular networks, such as the empty or full networks, where $C$ is at a maximum,
yet the expression contains a great deal of redundancy. To get a handle
on how much difference this might make, we can apply a compression algorithm
to all the equivalent bitstring representations, choosing the length of the most
compressed representation as a new measure I call zcomplexity. Inspired by the
brilliant use of standard compression programs (gzip, bzip2, Winzip etc.) to
classify texts written in an unknown language^, I initially thought to use one of
these compression libraries. However, all of the usual open source compression
libraries were optimised for compressing large computer files, and typically had
around 100 bits of overhead. Since the complexities of all networks studied
here are less than around 50 bits, this overhead precludes the use of standard
techniques.

So I developed my own compression routine, based around run length
encoding, one of the simplest compression techniques. The encoding is
simple to explain: Firstly a "wordsize" $w$ is chosen such that
$\log_2 N \le w \le \log_2 N + \log_2(N-1) - 1$. Then the representation consists of
$w$ 1 bits, followed by a zero, then $w$ bits encoding $N$, then the compressed
sequence of links. Repeat sequences are represented by a pair of $w$ bit words,
which give the repeat count and length of a sequence, followed by the sequence
to be repeated. As an example, the network:

1111110101010101010101

can be compressed to

1110  110  000  010  10
 w     N   rpt  len  seq

Here 000 represents 8, not 0, as a zero repeat count makes no sense! Also,
since the original representation is prefix free, the extra 0 that the compressed
sequence adds to the original is ignored.

¹Available from http://cs.anu.edu.au/~bdm/nauty
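
The worked example can be reproduced with a short Python sketch. It covers only the simplest case, a linklist consisting of a single repeated block, because the text does not specify how literal runs and several repeat blocks are delimited. The helper names and the exact field layout (header, $N$, repeat count, length, sequence) are assumptions read off the example above.

```python
def to_word(value, w):
    """Encode a value in 1..2**w as w bits; 2**w wraps to all zeros."""
    if not 1 <= value <= 2 ** w:
        raise ValueError(f"{value} does not fit in a {w}-bit word")
    return format(value % (2 ** w), f"0{w}b")


def from_word(bits):
    """Inverse of to_word: an all-zero word stands for 2**len(bits)."""
    return int(bits, 2) or 2 ** len(bits)


def compress_repeat(n, w, repeat, seq):
    """Compressed form of an n-node network whose linklist is seq repeated."""
    header = "1" * w + "0"  # w ones, then a zero, announce the wordsize
    return header + to_word(n, w) + to_word(repeat, w) + to_word(len(seq), w) + seq


def expand(compressed):
    """Recover the uncompressed description from a single-repeat encoding."""
    w = compressed.index("0")
    pos = w + 1
    n = from_word(compressed[pos : pos + w])
    pos += w
    repeat = from_word(compressed[pos : pos + w])
    pos += w
    length = from_word(compressed[pos : pos + w])
    pos += w
    seq = compressed[pos : pos + length]
    # Bits beyond the N(N-1)/2 linklist positions are ignored.
    linklist = (seq * repeat)[: n * (n - 1) // 2]
    return "1" * n + "0" + linklist
```

With `w = 3`, `compress_repeat(6, 3, 8, "10")` yields a 15-bit string in place of the 22-bit original, and `expand` discards the one surplus bit produced by the eighth repetition.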

By analogy with equation (1), define zcomplexity as

$$C_z = 1 - \log_2 \sum_b 2^{-\min\{c(b),\, N(N+1)/2+1\}} \qquad (4)$$

where $b$ iterates over all bitstring representations of the network we're
measuring, and $c(b)$ is the compressed length of $b$, using the best $w$, by the
aforementioned compression algorithm. The extra 1 takes into account a bit used to
indicate whether the compressed or uncompressed sequence is used, so $C_z \le C+1$.

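The optimal $w$ for the empty (or full) network is $w = \lceil \log_2 N \rceil$, and
zcomplexity can be readily computed as

$$
C_z = 2 + 3w + \left\lceil \frac{N(N-1)}{2^{w+1}} \right\rceil
= \begin{cases}
2 + 3\log_2 N + \left\lceil \frac{N-1}{2} \right\rceil & \text{if } N = 2^w \\
5 + 3\log_2(N-s) + \left\lceil \frac{N(N-1)}{4(N-s)} \right\rceil & \text{if } N = 2^{w-1} + s
\end{cases}
$$

Compared with equation (3), we can see that it makes a substantial difference.
Already at $N = 5$, $C_z = 13$ and $C = 16$, with the difference increasing with $N$.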
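
The size of the gap is easy to tabulate. The sketch below assumes the closed form reads $C_z = 2 + 3w + \lceil N(N-1)/2^{w+1} \rceil$ with $w = \lceil \log_2 N \rceil$, which is the reading that reproduces the quoted values at $N = 5$, and takes $C = \frac{1}{2}N(N+1)+1$ for the uncompressed measure. The function name is illustrative.

```python
def empty_network_complexities(n):
    """Return (Cz, C) for the empty or full n-node network.

    Assumes Cz = 2 + 3w + ceil(n(n-1) / 2**(w+1)) with w = ceil(log2 n).
    """
    w = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 1
    seq_bits = -(-(n * (n - 1)) // 2 ** (w + 1))  # integer ceiling division
    cz = 2 + 3 * w + seq_bits
    c = n * (n + 1) // 2 + 1
    return cz, c


if __name__ == "__main__":
    for n in range(3, 11):
        cz, c = empty_network_complexities(n)
        print(f"N={n:2d}  Cz={cz:3d}  C={c:3d}  C-Cz={c - cz:3d}")
```

Under these assumptions the difference $C - C_z$ grows roughly quadratically in $N$, since $C$ contains the full $\frac{1}{2}N(N-1)$ linklist bits while the repeated sequence in $C_z$ needs only about $\frac{1}{2}(N-1)$ bits.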

To compute the zcomplexity for an arbitrary graph, we need to iterate over
all possible bit representations of a graph. There are two obvious ways to do
this, since the number of nodes $N$, and number of links $l$ are identical in all
representations:

• Start with the bitstring with the initial $l$ bits of the linkfield set to 1,
and the remaining bits 0. Then iterate over all permutations, summing
the right hand term into a bin indexed by the canonical representation
of the network. This algorithm computes $C_z$ for all networks of $N$ nodes
and $l$ links. This algorithm has complexity
$\binom{N(N-1)/2}{l} = M(M-1)\cdots(M-l+1)/l!$, where $M = N(N-1)/2$.

• Take the network, and iterate over all permutations of the node labels.
Some of these permutations will have identical bitstring representations
as others. As each bitstring is found, store it in a set to avoid double
counting. This algorithm has complexity $N!$.
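
Both iteration strategies can be sketched in Python as follows. This illustration only collects the bitstrings; summing the compressed-length term of equation (4) over them is left out. The canonical form is taken to be the smallest linklist over all renumberings, as a brute-force stand-in for a canonical labeling routine such as nauty, and the 0-based link ordering $j(j-1)/2+i$ is assumed. Function names are illustrative.

```python
from itertools import combinations, permutations


def link_pairs(n):
    """Node pairs (i, j), i < j, in assumed linklist order j*(j-1)/2 + i."""
    return [(i, j) for j in range(n) for i in range(j)]


def linklist(n, links):
    """Linklist bitstring of an n-node network given as a list of links."""
    index = {p: k for k, p in enumerate(link_pairs(n))}
    bits = ["0"] * len(index)
    for a, b in links:
        bits[index[(min(a, b), max(a, b))]] = "1"
    return "".join(bits)


def method_2(n, links):
    """Distinct linklists of one network, found by permuting node labels."""
    return {
        linklist(n, [(p[a], p[b]) for a, b in links])
        for p in permutations(range(n))
    }


def method_1(n, l):
    """All linklists with l links, binned by canonical form."""
    bins = {}
    for chosen in combinations(link_pairs(n), l):
        canon = min(method_2(n, chosen))  # brute-force canonical form
        bins.setdefault(canon, set()).add(linklist(n, chosen))
    return bins
```

For 4 nodes and 3 links, `method_1` sorts the 20 possible linklists into three bins, and each bin coincides with what `method_2` returns for any one network in it. The set used in `method_2` is what prevents double counting when several renumberings give the same bitstring.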

Figure 1: $C_z$ plotted against $C$ for all networks of order 8. Note the empty/full
network lying in the lower right hand corner



In my experiments, I calculate zcomplexity for all networks with linkcount $l$ such
that $\binom{N(N-1)/2}{l}$ … , and sample randomly networks with greater link
counts.

Fig. 1 shows $C_z$ plotted against $C$ for all networks of order 8, which is about
the largest size network for which an exhaustive computation of $C_z$ is feasible.

Unfortunately, without a smarter way of being able to iterate over equivalent
bitstring representations, zcomplexity is not a feasible measure, even if it more
accurately represents complexity. The disparity between $C_z$ and $C$ is greatest
for highly structured graphs, so it would be of interest to know when we can use
$C$, and when a more detailed calculation is needed.

Claussen^ introduced a measure he calls offdiagonal complexity, which
measures the entropy of the distribution of links between nodes of different degree.
Offdiagonal complexity is zero for regular graphs, as the node degree
distribution is sharply peaked, takes on moderate values for random graphs
(where node degree distribution is roughly exponential) and is extremal for
scale-free graphs. Since the discrepancy between $C$ and $C_z$ was most pronounced
with regular graphs, I looked at offdiagonal complexity as a predictor for this
discrepancy.

Figure 2: Compression error as a function of $C$ and offdiagonal complexity for
networks with 10 nodes. All networks with link count less than 7 were evaluated
by method 1, and 740 graphs with more than 7 links were selected at random,
and computed using method 2. The separation between the two groups is due
to compressibility of sparse networks.

Figure 3: Compression error as a function of offdiagonal complexity. A least
squares linear fit is also shown

Figure 2 shows the compression error (defined as $\frac{C - C_z}{C}$) plotted as a
function of offdiagonal complexity and $C$. The dataset falls clearly into two groups:
all sparse networks with link count less than 7, and those graphs sampled
randomly, corresponding to the two different methods mentioned above. The
sparse networks are expected to be fairly regular, hence have high compression
error, whereas randomly selected networks are most likely to be incompressible,
hence have low compression error.

Figure 3 shows the results of a linear regression analysis on offdiagonal
complexity with compression error. The correlation coefficient is -0.87. So clearly
offdiagonal complexity is correlated (negatively) with compression error, much
as we expected, however it is not apparently a good test for indicating if the
compression error is large. A better distinguishing characteristic is if $C$ is greater
than the mean random $C$ (which can be feasibly calculated) by about 3-4 bits.
What remains to be done is to look at networks generated by a dynamical
process, for example Erdős-Rényi random graphs or Barabási-Albert preferential
attachment ^, to see if they fill in the gap between regular and algorithmically
random graphs.

5 Conclusion



In this paper, a simple representation language for $N$-node undirected graphs is
given. An algorithm is presented for computing the complexity of such graphs,
and the difference between this measure, and one based on a Turing complete
representation language is estimated. For most graphs, the measure presented
here computes complexity correctly; only the complexity of graphs with a great
deal of regularity is overestimated.

Code implementing this algorithm is provided as a C++ library, and
is available from version 4.D17 onwards as part of the EcoLab system, an open
source modelling framework hosted at http://ecolab.sourceforge.net

Obviously, undirected graphs are simply the start of this work. It can be
readily generalised to directed graphs, and labeled graphs such as food webs
(although if the edges are labeled by a real value, some form of discretisation of
labels would be needed).

Furthermore, most interest is in complexities of networks generated by
dynamical processes, particularly evolutionary processes. Some of the first
processes that should be examined are the classic Erdős-Rényi random graphs and
Barabási-Albert preferential attachment.